Pair HMM based gap statistics for re-evaluation of indels in alignments with affine gap penalties: Extended Version
Although computationally aligning sequences is a crucial step in the vast
majority of comparative genomics studies, our understanding of alignment biases
still needs to be improved. To infer true structural or homologous regions
computational alignments need further evaluation. It has been shown that the
accuracy of aligned positions can drop substantially in particular around gaps.
Here we focus on re-evaluation of score-based alignments with affine gap
penalty costs. We exploit their relationships with pair hidden Markov models
and develop efficient algorithms by which to identify gaps which are
significant in terms of length and multiplicity. We evaluate our statistics
with respect to the well-established structural alignments from SABmark and
find that indel reliability substantially increases with their significance, in
particular in worst-case twilight-zone alignments. This indicates that our
statistics can reliably complement other methods, which mostly focus on the
reliability of match positions. Comment: 17 pages, 7 figures
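To make the affine gap model in this abstract concrete, here is a minimal sketch of the classic three-matrix (Gotoh-style) recursion for a global alignment score. It is not the paper's gap statistics. The function name `affine_gap_score` and all scoring values are assumptions chosen for illustration. The three matrices play the same role as the match and two insert states of a pair HMM.

```python
import math

def affine_gap_score(a, b, match=1.0, mismatch=-1.0,
                     gap_open=-3.0, gap_extend=-1.0):
    """Global alignment score; a gap of length k costs
    gap_open + (k - 1) * gap_extend (illustrative parameters)."""
    n, m = len(a), len(b)
    neg = -math.inf
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these assumed parameters, aligning "ACGT" to "AT" favours one gap of length two over two separate gaps of length one, which is the behaviour that distinguishes affine from linear gap costs.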
Sparsification of RNA Structure Prediction Including Pseudoknots
Background: Although many RNA molecules contain pseudoknots, computational prediction of pseudoknotted RNA structure is still in its infancy due to the high running time and space consumption implied by the dynamic programming formulations of the problem. Results: In this paper, we introduce sparsification to significantly speed up the dynamic programming approaches for pseudoknotted RNA structure prediction, which also lowers the space requirements. Although sparsification has been applied to a number of RNA-related structure prediction problems in the past few years, we provide the first application of sparsification to pseudoknotted RNA structure prediction specifically and to handling gapped fragments more generally, which has a much more complex recursive structure than other problems to which sparsification has been applied. We analyse how to sparsify four pseudoknot structure prediction algorithms, among those the most general method available (the Rivas-Eddy algorithm) and the fastest one (the Reeder-Giegerich algorithm). In all algorithms the number of “candidate” substructures to be considered is reduced. Conclusions: Our experimental results on the sparsified Reeder-Giegerich algorithm suggest a linear speedup over the unsparsified implementation.
Fast prediction of RNA-RNA interaction
Background: Regulatory antisense RNAs are a class of ncRNAs that regulate gene expression by prohibiting the translation of an mRNA by establishing stable interactions with a target sequence. There is great demand for efficient computational methods to predict the specific interaction between an ncRNA and its target mRNA(s). There are a number of algorithms in the literature which can predict a variety of such interactions - unfortunately at a very high computational cost. Although some existing target prediction approaches are much faster, they are specialized for interactions with a single binding site. Methods: In this paper we present a novel algorithm to accurately predict the minimum free energy structure of RNA-RNA interaction under the most general type of interactions studied in the literature. Moreover, we introduce a fast heuristic method to predict the specific (multiple) binding sites of two interacting RNAs. Results: We verify the performance of our algorithms for joint structure and binding site prediction on a set of known interacting RNA pairs. Experimental results show our algorithms are highly accurate and outperform all competitive approaches.
Fast and scalable inference of multi-sample cancer lineages.
Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open source and available at http://viq854.github.io/lichee
Sparsification of RNA structure prediction including pseudoknots
Background: Although many RNA molecules contain pseudoknots, computational prediction of pseudoknotted RNA structure is still in its infancy due to the high running time and space consumption implied by the dynamic programming formulations of the problem. Results: In this paper, we introduce sparsification to significantly speed up the dynamic programming approaches for pseudoknotted RNA structure prediction, which also lowers the space requirements. Although sparsification has been applied to a number of RNA-related structure prediction problems in the past few years, we provide the first application of sparsification to pseudoknotted RNA structure prediction specifically and to handling gapped fragments more generally, which has a much more complex recursive structure than other problems to which sparsification has been applied. We analyse how to sparsify four pseudoknot structure prediction algorithms, among those the most general method available (the Rivas-Eddy algorithm) and the fastest one (the Reeder-Giegerich algorithm). In all algorithms the number of "candidate" substructures to be considered is reduced. Conclusions: Our experimental results on the sparsified Reeder-Giegerich algorithm suggest a linear speedup over the unsparsified implementation.
A partition function algorithm for interacting nucleic acid strands
Recent interests, such as RNA interference and antisense RNA regulation, strongly motivate the problem of predicting whether two nucleic acid strands interact
Characterization of Coding Synonymous and Non-Synonymous Variants in ADAMTS13 Using Ex Vivo and In Silico Approaches
Synonymous variations, which are defined as codon substitutions that do not change the encoded amino acid, were previously thought to have no effect on the properties of the synthesized protein(s). However, mounting evidence shows that these “silent” variations can have a significant impact on protein expression and function and should no longer be considered “silent”. Here, the effects of six synonymous and six non-synonymous variations, previously found in the gene of ADAMTS13, the von Willebrand Factor (VWF) cleaving hemostatic protease, have been investigated using a variety of approaches. The ADAMTS13 mRNA and protein expression levels, as well as the conformation and activity of the variants, have been compared to those of wild-type ADAMTS13. Interestingly, not only the non-synonymous variants but also the synonymous variants have been found to change the protein expression levels, conformation and function. Bioinformatic analysis of ADAMTS13 mRNA structure, amino acid conservation and codon usage allowed us to establish correlations between mRNA stability, RSCU, and intracellular protein expression. This study demonstrates that variants, and more specifically synonymous variants, can have a substantial and definite effect on ADAMTS13 function and that bioinformatic analysis may allow development of predictive tools to identify variants that will have significant effects on the encoded protein
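The RSCU (relative synonymous codon usage) mentioned in this abstract is conventionally the observed count of a codon divided by the mean count over all codons that encode the same amino acid. The following is a minimal sketch of that calculation under the standard genetic code. The function name `rscu` and the input handling are assumptions for illustration, not the pipeline used in the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (third base varies fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds):
    """Return {codon: RSCU} for every codon whose amino acid
    occurs at least once in the in-frame coding sequence `cds`."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        mean = total / len(family)
        for c in family:
            result[c] = counts[c] / mean
    return result
```

A value of 1.0 means a codon is used exactly as often as expected under uniform synonymous usage; values above 1.0 indicate a preferred codon. In this sketch stop codons are treated as one synonymous family, which is a simplification.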
smyRNA: A Novel Ab Initio ncRNA Gene Finder
Background: Non-coding RNAs (ncRNAs) have important functional roles in the cell: for example, they regulate gene expression by means of establishing stable joint structures with target mRNAs via complementary sequence motifs. Sequence motifs are also important determinants of the structure of ncRNAs. Although ncRNAs are abundant, discovering novel ncRNAs on genome sequences has proven to be a hard task; in particular, past attempts at ab initio ncRNA search mostly failed, with the exception of tools that can identify micro RNAs. Methodology/Principal Findings: We present a very general ab initio ncRNA gene finder that exploits differential distributions of sequence motifs between ncRNAs and background genome sequences. Conclusions/Significance: Our method, once trained on a set of ncRNAs from a given species, can be applied to the genome sequences of other organisms to find not only ncRNAs homologous to those in the training set but also others that potentially belong to novel (and perhaps unknown) ncRNA families. Availability
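One generic way to exploit a difference in motif distributions between two sequence sets is a k-mer log-odds score. The sketch below illustrates that general idea only and should not be read as smyRNA's actual model. The function names, the choice of k, the RNA alphabet and the pseudocount are all assumptions.

```python
import math
from collections import Counter
from itertools import product

def kmer_freqs(seqs, k, alphabet="ACGU", pseudocount=1.0):
    """Smoothed relative frequency of every k-mer over `alphabet`."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}

def log_odds_table(foreground, background, k):
    """log(P_foreground / P_background) for each k-mer."""
    fg = kmer_freqs(foreground, k)
    bg = kmer_freqs(background, k)
    return {m: math.log(fg[m] / bg[m]) for m in fg}

def score(seq, table, k):
    """Sum of log-odds over all k-mers of `seq`; k-mers outside
    the table (e.g. containing N) contribute 0."""
    return sum(table.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))
```

A positive score suggests a window resembles the foreground (training) set more than the background. The pseudocount keeps the logarithm finite for k-mers never seen in one of the sets.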
Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution.
The early detection of relapse following primary surgery for non-small-cell lung cancer and the characterization of emerging subclones, which seed metastatic sites, might offer new therapeutic approaches for limiting tumour recurrence. The ability to track the evolutionary dynamics of early-stage lung cancer non-invasively in circulating tumour DNA (ctDNA) has not yet been demonstrated. Here we use a tumour-specific phylogenetic approach to profile the ctDNA of the first 100 TRACERx (Tracking Non-Small-Cell Lung Cancer Evolution Through Therapy (Rx)) study participants, including one patient who was also recruited to the PEACE (Posthumous Evaluation of Advanced Cancer Environment) post-mortem study. We identify independent predictors of ctDNA release and analyse the tumour-volume detection limit. Through blinded profiling of postoperative plasma, we observe evidence of adjuvant chemotherapy resistance and identify patients who are very likely to experience recurrence of their lung cancer. Finally, we show that phylogenetic ctDNA profiling tracks the subclonal nature of lung cancer relapse and metastasis, providing a new approach for ctDNA-driven therapeutic studies